On Bijective Variants of the
Burrows-Wheeler Transform



Manfred Kufleitner 

Universität Stuttgart, FMI,
Universitätsstr. 38, 70569 Stuttgart, Germany
kufleitner@fmi.uni-stuttgart.de



Abstract. The sort transform (ST) is a modification of the Burrows-Wheeler transform
(BWT). Both transformations map an arbitrary word of length n to a pair consisting of a
word of length n and an index between 1 and n. The BWT sorts all rotation conjugates of
the input word, whereas the ST of order k only uses the first k letters for sorting all such
conjugates. If two conjugates start with the same prefix of length k, then the indices of the
rotations are used for tie-breaking. Both transforms output the sequence of the last letters of
the sorted list and the index of the input within the sorted list. In this paper, we discuss a
bijective variant of the BWT (due to Scott), proving its correctness and relations to other
results due to Gessel and Reutenauer (1993) and Crochemore, Désarménien, and Perrin (2005).
Further, we present a novel bijective variant of the ST.

1 Introduction

The Burrows-Wheeler transform (BWT) is a widely used preprocessing technique
in lossless data compression [5]. It brings every word into a form which is likely to
be easier to compress [IB]. Its compression performance is almost as good as PPM
(prediction by partial matching) schemes [7] while its speed is comparable to that
of Lempel-Ziv algorithms [THlfTi]. Therefore, BWT-based compression schemes are a
very reasonable trade-off between running time and compression ratio.

In the classic setting, the BWT maps a word of length n to a word of length n and
an index (comprising O(log n) bits). Thus, the BWT is not bijective and hence, it
introduces new redundancies to the data, which is cumbersome and undesired in
applications of data compression or cryptography. Instead of using an index, a very
common technique is to assume that the input has a unique end-of-string symbol
[3lfl8]. Even though this often simplifies proofs or allows speeding up the algorithms,
the use of an end-of-string symbol introduces new redundancies (again O(log n) bits
are required for coding the end-of-string symbol).

We discuss bijective versions of the BWT which are one-to-one correspondences 
between words of length n. In particular, no index and no end-of-string symbol is 
needed. Not only does bijectivity save a few bits, for example, it also increases data 
security when cryptographic procedures are involved; it is more natural and it can 
help us to understand the BWT even better. Moreover, the bijective variants give 
us new possibilities for enhancements; for example, in the bijective BWT different 
orders on the letters can be used for the two main stages. 

Several variants of the BWT have been introduced [2"]TT7]. An overview can be
found in the textbook by Adjeroh, Bell, and Mukherjee [1]. One particularly important
variant for this paper is the sort transform (ST), which is also known under the name
Schindler transform [22]. In the original paper, the inverse of the ST is described only
very briefly. More precise descriptions and improved algorithms for the inverse of the
ST have been proposed recently [T§1l20|l21]. As for the BWT, the ST also involves an
index or an end-of-string symbol. In particular, the ST is not onto and it introduces
new redundancies.

The bijective BWT was discovered and first described by Scott (2007), but his
exposition of the algorithm was somewhat cryptic, and was not appreciated as such.
In particular, the fact that this transform is based on the Lyndon factorization went
unnoticed by Scott. Gil and Scott [12] provided an accessible description of the
algorithm. Here, we give an alternative description, a proof of its correctness, and more
importantly, draw connections between Scott's algorithm and other results in
combinatorics on words. Further, this variation of the BWT is used to introduce techniques
which are employed in the bijective sort transform, which is the main contribution
of this paper. The forward transform of the bijective ST is rather easy, but we
have to be very careful with some details. Compared with the inverse of the bijective
BWT, the inverse of the bijective ST is more involved.

Outline. The paper is organized as follows. In Section 2 we fix some notation
and repeat basic facts about combinatorics on words. On our way to the bijective
sort transform (Section 6) we investigate the BWT (Section 3), the bijective BWT
(Section 4), and the sort transform (Section 5). We give full constructive proofs for
the injectivity of the respective transforms. Each section ends with a running example
which illustrates the respective concepts. Apart from basic combinatorics on words,
the paper is completely self-contained.

2 Preliminaries 

Throughout this paper we fix the finite non-empty alphabet $\Sigma$ and assume that $\Sigma$
is equipped with a linear order $\leq$. A word is a sequence $a_1 \cdots a_n$ of letters
$a_i \in \Sigma$, $1 \leq i \leq n$. The set of all such sequences is denoted by $\Sigma^*$; it is
the free monoid over $\Sigma$ with concatenation as composition and with the empty word
$\varepsilon$ as neutral element. The set $\Sigma^+ = \Sigma^* \setminus \{\varepsilon\}$ consists of all
non-empty words. For words $u, v$ we write $u \leq v$ if $u = v$ or if $u$ is lexicographically
smaller than $v$ with respect to the order $\leq$ on the letters. Let $w = a_1 \cdots a_n \in \Sigma^+$
be a non-empty word with letters $a_i \in \Sigma$. The length of $w$, denoted by $|w|$, is $n$. The
empty word is the unique word of length 0. We can think of $w$ as a labeled linear order:
position $i$ of $w$ is labeled by $a_i \in \Sigma$ and in this case we write $\lambda_w(i) = a_i$, so
each word $w$ induces a labeling function $\lambda_w$. The first letter $a_1$ of $w$ is denoted by
first($w$) while the last letter $a_n$ is denoted by last($w$). The reversal of a word $w$ is
$\overline{w} = a_n \cdots a_1$. We say that two words $u, v$ are conjugate if $u = st$ and $v = ts$
for some words $s, t$, i.e., $u$ and $v$ are cyclic shifts of one another. The $j$-fold
concatenation of $w$ with itself is denoted by $w^j$. A word $u$ is a root of $w$ if $w = u^j$ for
some $j \in \mathbb{N}$. A word $w$ is primitive if $w = u^j$ implies $j = 1$ and hence $u = w$, i.e.,
$w$ has only the trivial root $w$.

The right-shift of $w = a_1 \cdots a_n$ is $r(w) = a_n a_1 \cdots a_{n-1}$ and the $i$-fold right
shift $r^i(w)$ is defined inductively by $r^0(w) = w$ and $r^{i+1}(w) = r(r^i(w))$. We have
$r^i(w) = a_{n-i+1} \cdots a_n a_1 \cdots a_{n-i}$ for $0 \leq i < n$. The word $r^i(w)$ is also
well-defined for $i \geq n$ and then $r^i(w) = r^j(w)$ where $j = i \bmod n$. We define the ordered
conjugacy class of a word $w \in \Sigma^n$ as $[w] = (w_1, \ldots, w_n)$ where $w_i = r^{i-1}(w)$. It is
convenient to think of $[w]$ as a cycle of length $n$ with a pointer to a distinguished
starting position. Every position $i$, $1 \leq i \leq n$, on this cycle is labeled by $a_i$. In
particular, $a_1$ is a successor of $a_n$ on this cycle since the position 1 is a successor of the
position $n$. The mapping $r$ moves the pointer to its predecessor. The (unordered)
conjugacy class of $w$ is the multiset $\{w_1, \ldots, w_n\}$. Whenever there is no confusion,
then by abuse of notation we also write $[w]$ to denote the (unordered) conjugacy
class of $w$. For instance, this is the case if $w$ is in some way distinguished within its
conjugacy class, which is true if $w$ is a Lyndon word. A Lyndon word is a non-empty
word which is the unique lexicographic minimal element within its conjugacy class.
More formally, let $[w] = (w, w_2, \ldots, w_n)$, then $w \in \Sigma^+$ is a Lyndon word if $w < w_i$
for all $i \in \{2, \ldots, n\}$. Lyndon words have a lot of nice properties [15]. For instance,
Lyndon words are primitive. Another interesting fact is the following.

Fact 1 (Chen, Fox, and Lyndon [6]) Every word $w \in \Sigma^+$ has a unique
factorization $w = v_s \cdots v_1$ such that $v_1 \leq \cdots \leq v_s$ is a non-decreasing sequence of
Lyndon words.

An alternative formulation of the above fact is that every word $w$ has a unique
factorization $w = v_s^{n_s} \cdots v_1^{n_1}$ where $n_i \geq 1$ for all $i$ and where
$v_1 < \cdots < v_s$ is a strictly increasing sequence of Lyndon words. The factorization of $w$
as in Fact 1 is called the Lyndon factorization of $w$. It can be computed in linear time
using Duval's algorithm [9].
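The linear-time computation can be sketched as follows. This is a minimal Python illustration of the standard formulation of Duval's algorithm, not code from the cited reference; the function name `lyndon_factorization` is an assumption made here, and letters are compared by Python's built-in character order. The factors are returned from left to right, i.e., in the order $v_s, \ldots, v_1$.

```python
def lyndon_factorization(w):
    """Return the Lyndon factors of w from left to right: [v_s, ..., v_1]."""
    factors = []
    n = len(w)
    i = 0
    while i < n:
        # Extend the candidate w[i:j]; k trails j by one period of the candidate.
        j, k = i + 1, i
        while j < n and w[k] <= w[j]:
            k = i if w[k] < w[j] else k + 1
            j += 1
        period = j - k
        # Emit every complete copy of the Lyndon word of length `period`.
        while i <= k:
            factors.append(w[i:i + period])
            i += period
    return factors


print(lyndon_factorization("bcbccbcbcabbaaba"))
# ['bcbcc', 'bc', 'bc', 'abb', 'aab', 'a']
```

The output is a non-increasing sequence of Lyndon words whose concatenation is the input.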

Suppose we are given a multiset $V = \{v_1, \ldots, v_s\}$ of Lyndon words enumerated
in non-decreasing order $v_1 \leq \cdots \leq v_s$. Now, $V$ uniquely determines the word
$w = v_s \cdots v_1$. Therefore, the Lyndon factorization induces a one-to-one correspondence
between arbitrary words of length $n$ and multisets of Lyndon words of total length $n$.
Of course, by definition of Lyndon words, the multiset $\{v_1, \ldots, v_s\}$ of Lyndon words
and the multiset $\{[v_1], \ldots, [v_s]\}$ of conjugacy classes of Lyndon words are also in
one-to-one correspondence.

We extend the order $\leq$ on $\Sigma$ as follows to non-empty words. Let $w^\omega = www \cdots$
be the infinite sequence obtained as the infinite power of $w$. For $u, v \in \Sigma^+$ we
write $u \leq^\omega v$ if either $u^\omega = v^\omega$ or $u^\omega = paq$ and $v^\omega = pbr$ for
$p \in \Sigma^*$, $a, b \in \Sigma$ with $a < b$, and infinite sequences $q, r$; phrased differently,
$u \leq^\omega v$ means that the infinite sequences $u^\omega$ and $v^\omega$ satisfy
$u^\omega \leq v^\omega$. If $u$ and $v$ have the same length, then $\leq^\omega$ coincides with the
lexicographic order induced by the order on the letters. For arbitrary words, $\leq^\omega$ is only
a preorder since for example $u \leq^\omega uu$ and $uu \leq^\omega u$. On the other hand, if
$u \leq^\omega v$ and $v \leq^\omega u$ then $u^\omega = v^\omega$. Hence, by the periodicity
lemma [10], there exists a common root $p \in \Sigma^+$ and $g, h \in \mathbb{N}$ such that $u = p^g$ and
$v = p^h$. Also note that $b \leq ba$ whereas $ba \leq^\omega b$ for $a < b$.
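A comparison with respect to this preorder never needs to inspect the infinite sequences themselves. The following Python sketch (the name `omega_leq` is chosen here for illustration) compares the prefixes of length $|u| + |v|$ of the two infinite powers; by the periodicity lemma, if these prefixes coincide then the infinite powers coincide, so this bound suffices.

```python
def omega_leq(u, v):
    """Return True if u^omega <= v^omega for non-empty words u and v."""
    m = len(u) + len(v)
    # Prefixes of length |u| + |v| of the infinite powers decide the comparison.
    prefix_u = (u * (m // len(u) + 1))[:m]
    prefix_v = (v * (m // len(v) + 1))[:m]
    return prefix_u <= prefix_v


print("b" <= "ba")            # lexicographic order: True
print(omega_leq("ba", "b"))   # omega order reverses this pair: True
print(omega_leq("b", "ba"))   # False
```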

Intuitively, the context of order $k$ of $w$ is the sequence of the first $k$ letters of $w$.
We want this notion to be well-defined even if $|w| < k$. To this end let context$_k(w)$
be the prefix of length $k$ of $w^\omega$, i.e., context$_k(w)$ consists of the first $k$ letters on the
cycle $[w]$. Note that our definition of a context of order $k$ is left-right symmetric to the
corresponding notion used in data compression. This is due to the fact that typical
compression schemes apply the BWT or the ST to the reversal of the input.
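As a small illustration of this definition, the context of order $k$ can be obtained by repeating $w$ often enough and cutting off after $k$ letters. The helper name `context` is an assumption of this sketch.

```python
def context(w, k):
    """Prefix of length k of the infinite power w^omega (w must be non-empty)."""
    repetitions = k // len(w) + 1
    return (w * repetitions)[:k]


print(context("bcbcc", 7))  # bcbccbc
print(context("bc", 7))     # bcbcbcb
```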

An important construction in this paper is the standard permutation $\pi_w$ on the
set of positions $\{1, \ldots, n\}$ induced by a word $w = a_1 \cdots a_n \in \Sigma^n$ [11]. The first step
is to introduce a new order $\preceq$ on the positions of $w$ by sorting the letters within $w$
such that identical letters preserve their order. More formally, the linear order $\preceq$ on
$\{1, \ldots, n\}$ is defined as follows: $i \preceq j$ if

$a_i < a_j$   or   ($a_i = a_j$ and $i \leq j$).

Let $j_1 \prec \cdots \prec j_n$ be the linearization of $\{1, \ldots, n\}$ according to this new order. Now,
the standard permutation $\pi_w$ is defined by $\pi_w(i) = j_i$.
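Since the order $\preceq$ is exactly what a stable sort of the positions by their letters produces, the standard permutation can be sketched in a few lines of Python. The function name `standard_permutation` and the convention of returning a list whose entry $i - 1$ holds $\pi_w(i)$ are choices made for this illustration.

```python
def standard_permutation(w):
    """Return [pi_w(1), ..., pi_w(n)] using 1-based positions."""
    n = len(w)
    # sorted() is stable, so equal letters keep their relative order.
    return sorted(range(1, n + 1), key=lambda position: w[position - 1])


print(standard_permutation("bcbccbcbcabbaaba"))
# [10, 13, 14, 16, 1, 3, 6, 8, 11, 12, 15, 2, 4, 5, 7, 9]
```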

Example 2. Consider the word w = bcbccbcbcabbaaba over the ordered alphabet a <
b < c. We have $|w| = 16$. Therefore, the positions in w are {1, ..., 16}. For instance,
the label of position 6 is $\lambda_w(6) = b$. Its Lyndon factorization is w = bcbcc · bc · bc ·
abb · aab · a. The context of order 7 of the prefix bcbcc of length 5 is bcbccbc and the
context of order 7 of the factor bc is bcbcbcb. For computing the standard permutation
we write w column-wise, add positions, and then sort the pairs lexicographically:

    word w    w with positions    sorted
    b         (b, 1)              (a, 10)
    c         (c, 2)              (a, 13)
    b         (b, 3)              (a, 14)
    c         (c, 4)              (a, 16)
    c         (c, 5)              (b, 1)
    b         (b, 6)              (b, 3)
    c         (c, 7)              (b, 6)
    b         (b, 8)              (b, 8)
    c         (c, 9)              (b, 11)
    a         (a, 10)             (b, 12)
    b         (b, 11)             (b, 15)
    b         (b, 12)             (c, 2)
    a         (a, 13)             (c, 4)
    a         (a, 14)             (c, 5)
    b         (b, 15)             (c, 7)
    a         (a, 16)             (c, 9)

This yields the standard permutation

$$\pi_w = \begin{pmatrix}
1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 \\
10 & 13 & 14 & 16 & 1 & 3 & 6 & 8 & 11 & 12 & 15 & 2 & 4 & 5 & 7 & 9
\end{pmatrix}.$$

The conjugacy class $[w]$ of w is depicted in Figure 1(a); the i-th word in $[w]$ is written
in the i-th row. The last column of the matrix for $[w]$ is the reversal $\overline{w}$ of w.



3 The Burrows-Wheeler transform

The Burrows-Wheeler transform (BWT) maps words $w$ of length $n$ to pairs $(L, i)$
where $L$ is a word of length $n$ and $i$ is an index in $\{1, \ldots, n\}$. The word $L$ is usually
referred to as the Burrows-Wheeler transform of $w$. In particular, the BWT is not
surjective. We will see below how the BWT works and that it is one-to-one. It follows
that only a fraction of $1/n$ of all possible pairs $(L, i)$ appears as an image under the
BWT. For instance (bacd, 1) where a < b < c < d is not an image under the BWT.

Figure 1. Computing the BWT and the ST of the word w = bcbccbcbcabbaaba:
(a) Conjugacy class, (b) Lexicographically sorted, (c) Sorted by 2-order contexts.

For $w \in \Sigma^+$ we define $M(w) = (w_1, \ldots, w_n)$ where $\{w_1, \ldots, w_n\} = [w]$ and
$w_1 \leq \cdots \leq w_n$. Now, the Burrows-Wheeler transform of $w$ consists of the word
BWT$(w)$ = last$(w_1) \cdots$ last$(w_n)$ and an index $i$ such that $w = w_i$. Note that in
contrast to the usual definition of the BWT, we are using right shifts; at this point
this makes no difference but it unifies the presentation of succeeding transforms. At
first glance, it is surprising that one can reconstruct $M(w)$ from BWT$(w)$. Moreover,
if we know the index $i$ of $w$ in the sorted list $M(w)$, then we can reconstruct $w$ from
BWT$(w)$. One way to reconstruct $M(w)$ is presented in the following lemma.
For later use, we prove a more general statement than needed for computing the
inverse of the BWT.
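The forward transform defined above can be written down directly. The following Python sketch is a naive illustration that materializes all conjugates (quadratic space), not an efficient implementation; the name `bwt` is an assumption, the input is assumed to be non-empty, and for a non-primitive input the first matching index is returned.

```python
def bwt(w):
    """Return (BWT(w), i) where i is the 1-based index of w in M(w)."""
    n = len(w)
    # The ordered conjugacy class [w] = (r^0(w), ..., r^{n-1}(w)) via right shifts.
    conjugates = [w[n - i:] + w[:n - i] for i in range(n)]
    M = sorted(conjugates)
    L = "".join(u[-1] for u in M)
    return L, M.index(w) + 1


print(bwt("bcbccbcbcabbaaba"))
# ('bacbbaaccacbbcbb', 10)
```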

Lemma 3. Let $k \in \mathbb{N}$. Let $\bigcup_{i=1}^{s} [v_i] = \{w_1, \ldots, w_n\} \subseteq \Sigma^+$ be a multiset built
from conjugacy classes $[v_i]$. Let $M = (w_1, \ldots, w_n)$ satisfy context$_k(w_1) \leq \cdots \leq$
context$_k(w_n)$ and let $L$ = last$(w_1) \cdots$ last$(w_n)$ be the sequence of the last symbols.
Then

$$\mathrm{context}_k(w_i) = \lambda_L \pi_L(i) \cdot \lambda_L \pi_L^2(i) \cdots \lambda_L \pi_L^k(i)$$

where $\pi_L^t$ denotes the $t$-fold application of $\pi_L$ and $\lambda_L \pi_L^t(i) = \lambda_L(\pi_L^t(i))$.

Proof. By induction over the context length $t$, we prove that for all $i \in \{1, \ldots, n\}$
we have context$_t(w_i) = \lambda_L \pi_L^1(i) \cdots \lambda_L \pi_L^t(i)$. For $t = 0$ we have
context$_0(w_i) = \varepsilon$ and hence, the claim is trivially true. Let now $0 < t \leq k$. By the
induction hypothesis, the $(t-1)$-order context of each $w_i$ is
$\lambda_L \pi_L^1(i) \cdots \lambda_L \pi_L^{t-1}(i)$. By applying one right-shift, we see that the $t$-order
context of $r(w_i)$ is $\lambda_L(i) \cdot \lambda_L \pi_L^1(i) \cdots \lambda_L \pi_L^{t-1}(i)$.

The list $M$ meets the sort order induced by $k$-order contexts. In particular,
$(w_1, \ldots, w_n)$ is sorted by $(t-1)$-order contexts. Let $(u_1, \ldots, u_n)$ be a stable sort by
$t$-order contexts of the right-shifts $(r(w_1), \ldots, r(w_n))$. The construction of $(u_1, \ldots, u_n)$
only requires a sorting of the first letters of $(r(w_1), \ldots, r(w_n))$ such that identical
letters preserve their order. The sequence of first letters of the words $r(w_1), \ldots, r(w_n)$
is exactly $L$. By construction of $\pi_L$, it follows that
$(u_1, \ldots, u_n) = (r(w_{\pi_L(1)}), \ldots, r(w_{\pi_L(n)}))$.
Since $M$ is built from conjugacy classes, the multisets of elements occurring in
$(w_1, \ldots, w_n)$ and $(r(w_1), \ldots, r(w_n))$ are identical. The same holds for the multisets
induced by $(w_1, \ldots, w_n)$ and $(u_1, \ldots, u_n)$. Therefore, the sequences of $t$-order contexts
induced by $(w_1, \ldots, w_n)$ and $(u_1, \ldots, u_n)$ are identical. Moreover, we conclude

$$\mathrm{context}_t(w_i) = \mathrm{context}_t(u_i) = \mathrm{context}_t(r(w_{\pi_L(i)}))
= \lambda_L \pi_L(i) \cdot \lambda_L \pi_L^2(i) \cdots \lambda_L \pi_L^t(i)$$

which completes the induction. We note that in general $u_i \neq w_i$ since the sort order
of $M$ beyond $k$-order contexts is arbitrary. Moreover, for $t = k + 1$ the property
context$_t(w_i)$ = context$_t(u_i)$ does not need to hold (even though the multisets of
$(k+1)$-order contexts coincide). □

Note that in Lemma 3 we do not require that all $v_i$ have the same length. Applying
the BWT to conjugacy classes of words with different lengths has also been used for
the Extended BWT [32].

Corollary 4. The BWT is invertible, i.e., given (BWT$(w)$, $i$) where $i$ is the index
of $w$ in $M(w)$ one can reconstruct the word $w$.

Proof. We set $k = |w|$. Let $M = M(w)$ and $L$ = BWT$(w)$. Now, by Lemma 3 we see
that

$$w = w_i = \mathrm{context}_k(w_i) = \lambda_L \pi_L^1(i) \cdots \lambda_L \pi_L^{|L|}(i).$$

In particular, $w = \lambda_L \pi_L^1(i) \cdots \lambda_L \pi_L^{|L|}(i)$ only depends on $L$ and $i$. □

Remark 5. In the special case of the BWT it is possible to compute the $i$-th element
$w_i$ of $M(w)$ by using the inverse $\pi_L^{-1}$ of the permutation $\pi_L$:

$$w_i = \lambda_L \pi_L^{-|L|+1}(i) \cdots \lambda_L \pi_L^{-1}(i) \cdot \lambda_L(i).$$

This justifies the usual way of computing the inverse of (BWT$(w)$, $i$) from right to left
(by using the restriction of $\pi_L^{-1}$ to the cycle containing the element $i$). The motivation
is that the (required cycle of the) inverse $\pi_L^{-1}$ seems to be easier to compute than the
standard permutation $\pi_L$.

Example 6. We compute the BWT of w = bcbccbcbcabbaaba from Example 2. The
lexicographically sorted list $M(w)$ can be found in Figure 1(b). This yields the
transform (BWT$(w)$, $i$) = (bacbbaaccacbbcbb, 10) where $L$ = BWT$(w)$ is the last column of
the matrix $M(w)$ and $w$ is the $i$-th row in $M(w)$. The standard permutation of $L$ is

$$\pi_L = \begin{pmatrix}
1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 \\
2 & 6 & 7 & 10 & 1 & 4 & 5 & 12 & 13 & 15 & 16 & 3 & 8 & 9 & 11 & 14
\end{pmatrix}.$$

Now, $\pi_L^1(10) \cdots \pi_L^{16}(10)$ gives us the following sequence of positions starting with
$\pi_L(10) = 15$, where each arrow is one application of $\pi_L$:

15 → 11 → 16 → 14 → 9 → 13 → 8 → 12 → 3 → 7 → 5 → 1 → 2 → 6 → 4 → 10.

Applying the labeling function $\lambda_L$ to this sequence of positions yields

$$\lambda_L(15) \lambda_L(11) \lambda_L(16) \lambda_L(14) \lambda_L(9) \lambda_L(13) \lambda_L(8) \lambda_L(12)
\cdot \lambda_L(3) \lambda_L(7) \lambda_L(5) \lambda_L(1) \lambda_L(2) \lambda_L(6) \lambda_L(4) \lambda_L(10)$$

= bcbccbcbcabbaaba = w,

i.e., we have successfully reconstructed the input $w$ from (BWT$(w)$, $i$).
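The reconstruction of Corollary 4, as carried out by hand in the example, can be sketched in Python as follows. The function name `inverse_bwt` is an assumption of this sketch; positions are handled 0-based internally while the index argument is 1-based as in the text.

```python
def inverse_bwt(L, i):
    """Reconstruct w from (BWT(w), i), with i the 1-based index of w in M(w)."""
    n = len(L)
    # 0-based standard permutation of L; sorted() is stable.
    pi = sorted(range(n), key=lambda position: L[position])
    position = i - 1
    letters = []
    for _ in range(n):
        # Emit lambda_L(pi^1(i)), lambda_L(pi^2(i)), ..., lambda_L(pi^n(i)).
        position = pi[position]
        letters.append(L[position])
    return "".join(letters)


print(inverse_bwt("bacbbaaccacbbcbb", 10))
# bcbccbcbcabbaaba
```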



4 The bijective Burrows-Wheeler transform

Now we are ready to give a comprehensive description of Scott's bijective variant of the
BWT and to prove its correctness. It maps a word of length $n$ to a word of length $n$,
without any index or end-of-string symbol being involved. The key ingredient is the
Lyndon factorization: Suppose we are computing the BWT of a Lyndon word $v$, then
we do not need an index since we know that $v$ is the first element of the list $M(v)$.
This leads to the computation of a multi-word BWT of the Lyndon factors of the
input.

The bijective BWT of a word $w$ of length $n$ is defined as follows. Let $w = v_s \cdots v_1$
with $v_s \geq \cdots \geq v_1$ be the Lyndon factorization of $w$. Let LM$(w) = (u_1, \ldots, u_n)$ where
$u_1 \leq^\omega \cdots \leq^\omega u_n$ and where the multiset $\{u_1, \ldots, u_n\} = \bigcup_{i=1}^{s} [v_i]$. Then, the
bijective BWT of $w$ is BWTS$(w)$ = last$(u_1) \cdots$ last$(u_n)$. The S in BWTS is for Scottified.
Note that if $w$ is a power of a Lyndon word, then BWTS$(w)$ = BWT$(w)$.
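The definition can be turned into a short, deliberately naive Python sketch. The helper `lyndon_factorization` is the standard formulation of Duval's algorithm, and the sort key compares prefixes of length $2n$ of the infinite powers, which suffices for the order $\leq^\omega$ because every conjugate has length at most $n$. All names are assumptions of this illustration, and no attempt is made at efficiency.

```python
def lyndon_factorization(w):
    """Duval's algorithm: Lyndon factors of w from left to right."""
    factors, n, i = [], len(w), 0
    while i < n:
        j, k = i + 1, i
        while j < n and w[k] <= w[j]:
            k = i if w[k] < w[j] else k + 1
            j += 1
        while i <= k:
            factors.append(w[i:i + j - k])
            i += j - k
    return factors


def bwts(w):
    """Bijective BWT: last letters of all conjugates of the Lyndon factors,
    sorted by the omega-order."""
    n = len(w)
    conjugates = []
    for v in lyndon_factorization(w):
        conjugates.extend(v[i:] + v[:i] for i in range(len(v)))

    def omega_key(u):
        # A prefix of length 2n of u^omega decides every comparison needed here.
        return (u * (2 * n // len(u) + 1))[:2 * n]

    conjugates.sort(key=omega_key)
    return "".join(u[-1] for u in conjugates)


print(bwts("bcbccbcbcabbaaba"))
# abababaccccbbcbb
```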

In some sense, the bijective BWT can be thought of as the composition of the
Lyndon factorization [6] with the inverse of the Gessel-Reutenauer transform [11].
In particular, a first step towards a bijective BWT can be found in a 1993 article
by Gessel and Reutenauer [11] (prior to the publication of the BWT [3]). The link
between the Gessel-Reutenauer transform and the BWT was pointed out later by
Crochemore et al. [8]. A similar approach as in the bijective BWT has been employed
by Mantaci et al. [IS]; instead of the Lyndon factorization they used a decomposition
of the input into blocks of equal length. The output of this variant is a word and a
sequence of indices (one for each block). In its current form, the bijective BWT has
been proposed by Scott [23] in a newsgroup posting in 2007. Gil and Scott gave an
accessible version of the transform, an independent proof of its correctness, and they
tested its performance in data compression [12]. The outcome of these tests is that
the bijective BWT beats the usual BWT on almost all files of the Calgary Corpus
[I] by at least a few hundred bytes, which exceeds the gain of just saving the rotation
index.

Lemma 7. Let $w = v_s \cdots v_1$ with $v_s \geq \cdots \geq v_1$ be the Lyndon factorization of $w$,
let LM$(w) = (u_1, \ldots, u_n)$, and let $L$ = BWTS$(w)$. Consider the cycle $C$ of the
permutation $\pi_L$ which contains the element 1 and let $d$ be the length of $C$. Then

$$\lambda_L \pi_L^1(1) \cdots \lambda_L \pi_L^d(1) = v_1.$$

Proof. By Lemma 3 we see that $(\lambda_L \pi_L^1(1) \cdots \lambda_L \pi_L^d(1))^{|v_1|} = v_1^d$. Since $v_1$ is
primitive it follows $\lambda_L \pi_L^1(1) \cdots \lambda_L \pi_L^d(1) = v_1^z$ for some $z \in \mathbb{N}$. In particular,
the Lyndon factorization of $w$ ends with $v_1^z$.

Let $U$ be the subsequence of LM$(w)$ which consists of those $u_i$ which come from
this last factor $v_1^z$. The sequence $U$ contains each right-shift of $v_1$ exactly $z$ times.
Moreover, the sort-order within $U$ depends only on $|v_1|$-order contexts.

The element $v_1 = u_1$ is the first element in $U$ since $v_1$ is a Lyndon word. In
particular, $\pi_L^0(1) = 1$ is the first occurrence of $r^0(v_1) = v_1$ within $U$. Suppose $\pi_L^j(1)$
is the first occurrence of $r^j(v_1)$ within $U$. Let $\pi_L^j(1) = i_1 < \cdots < i_z$ be the indices of
all occurrences of $r^j(v_1)$ in $U$. By construction of $\pi_L$, we have $\pi_L(i_1) < \cdots < \pi_L(i_z)$
and therefore $\pi_L^{j+1}(1)$ is the first occurrence of $r^{j+1}(v_1)$ within $U$. Inductively, $\pi_L^j(1)$
always refers to the first occurrence of $r^j(v_1)$ within $U$ (for all $j \in \mathbb{N}$). In particular
it follows that $\pi_L^{|v_1|}(1) = 1$ and $z = 1$. □

Theorem 8. The bijective BWT is invertible, i.e., given BWTS$(w)$ one can
reconstruct the word $w$.

Proof. Let $L$ = BWTS$(w)$ and let $w = v_s \cdots v_1$ with $v_s \geq \cdots \geq v_1$ be the Lyndon
factorization of $w$. Each permutation admits a cycle structure. We decompose the
standard permutation $\pi_L$ into cycles $C_1, \ldots, C_t$. Let $i_j$ be the smallest element of the
cycle $C_j$ and let $d_j$ be the length of $C_j$. We can assume that $1 = i_1 < \cdots < i_t$.

We claim that $t = s$, $d_j = |v_j|$, and $\lambda_L \pi_L^1(i_j) \cdots \lambda_L \pi_L^{d_j}(i_j) = v_j$. By Lemma 7
we have $\lambda_L \pi_L^1(i_1) \cdots \lambda_L \pi_L^{d_1}(i_1) = v_1$. Let $\pi'_L$ denote the restriction of $\pi_L$ to the set
$C = C_2 \cup \cdots \cup C_t$, where by abuse of notation $C_2 \cup \cdots \cup C_t$ denotes the set of all
elements occurring in $C_2, \ldots, C_t$. Let $L'$ = BWTS$(v_s \cdots v_2)$. The word $L'$ can be
obtained from $L$ by removing all positions occurring in the cycle $C_1$. This yields a
monotone bijection

$$\alpha : C \to \{1, \ldots, |L'|\}$$

such that $\lambda_L(i) = \lambda_{L'} \alpha(i)$ and $\alpha \pi_L(i) = \pi_{L'} \alpha(i)$ for all $i \in C$. In particular, $\pi_{L'}$ has
the same cycle structure as $\pi'_L$ and $1 = \alpha(i_2) < \cdots < \alpha(i_t)$ is the sequence of the
minimal elements within the cycles. By induction on the number of Lyndon factors,

$$\begin{aligned}
v_s \cdots v_2
&= \lambda_{L'} \pi_{L'}^1 \alpha(i_t) \cdots \lambda_{L'} \pi_{L'}^{d_t} \alpha(i_t) \;\cdots\; \lambda_{L'} \pi_{L'}^1 \alpha(i_2) \cdots \lambda_{L'} \pi_{L'}^{d_2} \alpha(i_2) \\
&= \lambda_{L'} \alpha \pi_L^1(i_t) \cdots \lambda_{L'} \alpha \pi_L^{d_t}(i_t) \;\cdots\; \lambda_{L'} \alpha \pi_L^1(i_2) \cdots \lambda_{L'} \alpha \pi_L^{d_2}(i_2) \\
&= \lambda_L \pi_L^1(i_t) \cdots \lambda_L \pi_L^{d_t}(i_t) \;\cdots\; \lambda_L \pi_L^1(i_2) \cdots \lambda_L \pi_L^{d_2}(i_2).
\end{aligned}$$

Appending $\lambda_L \pi_L^1(i_1) \cdots \lambda_L \pi_L^{d_1}(i_1) = v_1$ to the last line allows us to reconstruct $w$ by

$$w = \lambda_L \pi_L^1(i_t) \cdots \lambda_L \pi_L^{d_t}(i_t) \;\cdots\; \lambda_L \pi_L^1(i_1) \cdots \lambda_L \pi_L^{d_1}(i_1).$$

Moreover, $t = s$ and $d_j = |v_j|$. We note that this formula for $w$ only depends on $L$
and does not require any index to an element in LM$(w)$. □

Example 9. We again consider the word w = bcbccbcbcabbaaba from Example 2 and
its Lyndon factorization $w = v_6 \cdots v_1$ where $v_6$ = bcbcc, $v_5$ = bc, $v_4$ = bc, $v_3$ = abb,
$v_2$ = aab, and $v_1$ = a. The lists $([v_1], \ldots, [v_6])$ and LM$(w)$ are:

    ([v_1], ..., [v_6])        LM(w)
     1   a                      1   a
     2   aab                    2   aab
     3   baa                    4   aba
     4   aba                    5   abb
     5   abb                    3   baa
     6   bab                    6   bab
     7   bba                    7   bba
     8   bc                     8   bc
     9   cb                    10   bc
    10   bc                    12   bcbcc
    11   cb                    15   bccbc
    12   bcbcc                  9   cb
    13   cbcbc                 11   cb
    14   ccbcb                 13   cbcbc
    15   bccbc                 16   cbccb
    16   cbccb                 14   ccbcb

Hence, we obtain $L$ = BWTS$(w)$ = abababaccccbbcbb as the sequence of the last
symbols of the words in LM$(w)$. The standard permutation $\pi_L$ induced by $L$ is

$$\pi_L = \begin{pmatrix}
1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 \\
1 & 3 & 5 & 7 & 2 & 4 & 6 & 12 & 13 & 15 & 16 & 8 & 9 & 10 & 11 & 14
\end{pmatrix}.$$

The cycles of $\pi_L$ arranged by their smallest elements are $C_1 = (1)$, $C_2 = (2, 3, 5)$,
$C_3 = (4, 7, 6)$, $C_4 = (8, 12)$, $C_5 = (9, 13)$, and $C_6 = (10, 15, 11, 16, 14)$. Applying
the labeling function $\lambda_L$ to the cycle $C_i$ (starting with the second element) yields
the Lyndon factor $v_i$. With this procedure, we reconstructed $w = v_6 \cdots v_1$ from $L$ =
BWTS$(w)$.
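The cycle-based reconstruction of Theorem 8 can be sketched in Python as follows. The function name `inverse_bwts` is an assumption of this illustration. Cycles are discovered in increasing order of their smallest elements, each cycle is read starting with its second element, and the resulting factors are concatenated in reverse order.

```python
def inverse_bwts(L):
    """Reconstruct w from L = BWTS(w) via the cycles of the standard permutation."""
    n = len(L)
    # 0-based standard permutation of L; sorted() is stable.
    pi = sorted(range(n), key=lambda position: L[position])
    visited = [False] * n
    factors = []  # collects v_1, v_2, ..., v_s in this order
    for start in range(n):
        if visited[start]:
            continue
        # `start` is the smallest element of a new cycle; read the labels
        # beginning with the second element and ending with `start` itself.
        letters = []
        position = pi[start]
        while True:
            visited[position] = True
            letters.append(L[position])
            if position == start:
                break
            position = pi[position]
        factors.append("".join(letters))
    # w = v_s ... v_1
    return "".join(reversed(factors))


print(inverse_bwts("abababaccccbbcbb"))
# bcbccbcbcabbaaba
```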
5 The sort transform

The sort transform (ST) is a BWT where we only sort the conjugates of the input
up to a given depth $k$ and then use the indices of the conjugates as a tie-breaker.
Depending on the depth $k$ and the implementation details this can speed up
compression (while at the same time slightly slowing down decompression).

In contrast to the usual presentation of the ST, we are using right shifts. This
defines a slightly different version of the ST. The effect is that the order of the
symbols occurring in some particular context is reversed. This makes sense, because
in data compression the ST is applied to the reversal of a word. Hence, in the ST
of the reversal of $w$ the order of the symbols in some particular context is the same
as in $w$. More formally, suppose $w = x_0 c a_1 x_1 c a_2 x_2 \cdots c a_s x_s$ for $c \in \Sigma^+$; then in the
sort transform of order $|c|$ of $w$, the order of the occurrences of the letters $a_i$ is not
changed. This property can enable better compression ratios on certain data.

While the standard permutation is induced by a sequence of letters (i.e., a word)
we now generalize this concept to sequences of words. For a list of non-empty words
$V = (v_1, \ldots, v_n)$ we now define the $k$-order standard permutation $\nu_{k,V}$ induced by $V$.
As for the standard permutation, the first step is the construction of a new linear
order $\preceq$ on $\{1, \ldots, n\}$. We define $i \preceq j$ by the condition

context$_k(v_i)$ < context$_k(v_j)$   or   (context$_k(v_i)$ = context$_k(v_j)$ and $i \leq j$).

Let $j_1 \prec \cdots \prec j_n$ be the linearization of $\{1, \ldots, n\}$ according to this new order. The
idea is that we sort the line numbers of $v_1, \ldots, v_n$ by first considering the $k$-order
contexts and, if these are equal, then use the line numbers as tie-breaker. As before,
the linearization according to $\preceq$ induces a permutation $\nu_{k,V}$ by setting $\nu_{k,V}(i) = j_i$.
Now, $\nu_{k,V}(i)$ is the position of $v_i$ if we are sorting $V$ by $k$-order context such that
the line numbers serve as tie-breaker. We set $M_k(v_1, \ldots, v_n) = (w_1, \ldots, w_n)$ where
$w_i = v_{\nu_{k,V}(i)}$. Now, we are ready to define the sort transform of order $k$ of a word $w$:
Let $M_k([w]) = (w_1, \ldots, w_n)$; then ST$_k(w)$ = last$(w_1) \cdots$ last$(w_n)$, i.e., we first sort
all cyclic right-shifts of $w$ by their $k$-order contexts (by using a stable sort method)
and then we take the sequence of last symbols according to this new sort order as the
image under ST$_k$. Since the tie-breaker relies on right-shifts, we have ST$_0(w) = \overline{w}$, i.e.,
ST$_0$ is the reversal mapping. The $k$-order sort transform of $w$ is the pair (ST$_k(w)$, $i$)
where $i$ is the index of $w$ in $M_k([w])$. As for the BWT, we see that the $k$-order sort
transform is not bijective.
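A direct, unoptimized Python sketch of this definition is given next. The names `context` and `sort_transform` are assumptions of this illustration and the input is assumed to be non-empty. The stable sort of the right-shift counts $0, \ldots, n-1$ by their $k$-order contexts realizes the tie-breaking by line numbers.

```python
def context(w, k):
    """Prefix of length k of the infinite power w^omega."""
    return (w * (k // len(w) + 1))[:k]


def sort_transform(w, k):
    """Return (ST_k(w), i) with i the 1-based index of w in M_k([w])."""
    n = len(w)
    # Ordered conjugacy class [w] = (r^0(w), ..., r^{n-1}(w)) using right shifts.
    shifts = [w[n - i:] + w[:n - i] for i in range(n)]
    # Stable sort by k-order contexts; ties keep the order of the line numbers.
    order = sorted(range(n), key=lambda i: context(shifts[i], k))
    L = "".join(shifts[i][-1] for i in order)
    return L, order.index(0) + 1


w = "bcbccbcbcabbaaba"
print(sort_transform(w, 0)[0] == w[::-1])   # ST_0 is the reversal mapping: True
print(sort_transform(w, len(w)))            # ('bacbbaaccacbbcbb', 10)
```

For $k = |w|$ and a primitive word $w$ no ties occur, so the result coincides with the BWT of Example 6.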

Next, we show that it is possible to reconstruct $M_k([w])$ from ST$_k(w)$. Hence,
it is possible to reconstruct $w$ from the pair (ST$_k(w)$, $i$) where $i$ is the index of $w$
in $M_k([w])$. The presentation of the back transform is as follows. First, we will
introduce the $k$-order context graph $G_k$ and we will show that it is possible to
rebuild $M_k([w])$ from $G_k$. Then we will show how to construct $G_k$ from ST$_k(w)$. Again,
the approach will be slightly more general than required at the moment; but we will
be able to reuse it in the presentation of a bijective ST.

Let $V = ([u_1], \ldots, [u_s]) = (v_1, \ldots, v_n)$ be a list of words built from conjugacy
classes $[u_i]$ of non-empty words $u_i$. Let $M = (w_1, \ldots, w_n)$ be an arbitrary
permutation of the elements in $V$. We now describe the edge-labeled directed
graph $G_k(M)$, the $k$-order context graph of $M$, which will be used later as a
presentation tool for the inverses of the ST and the bijective ST. The vertices of $G_k(M)$
consist of all $k$-order contexts context$_k(w)$ of words $w$ occurring in $M$. We draw an
edge $(c_1, i, c_2)$ from context $c_1$ to context $c_2$ labeled by $i$ if $c_1$ = context$_k(w_i)$ and
$c_2$ = context$_k(r(w_i))$. Hence, every index $i \in \{1, \ldots, n\}$ of $M$ defines a unique edge
in $G_k(M)$. We can also think of last$(w_i)$ as an additional implicit label of the edge
$(c_1, i, c_2)$, since $c_2$ = context$_k$(last$(w_i) c_1)$.

A configuration $(\mathcal{C}, c)$ of the $k$-order context graph $G_k(M)$ consists of a subset of
the edges $\mathcal{C}$ and a vertex $c$. The idea is that (starting at context $c$) we are walking
along the edges of $G_k(M)$ and whenever an edge is used, it is removed from the set
of edges $\mathcal{C}$. We now define the transition

$$(\mathcal{C}_1, c_1) \xrightarrow{\,u\,} (\mathcal{C}_2, c_2)$$

from a configuration $(\mathcal{C}_1, c_1)$ to another configuration $(\mathcal{C}_2, c_2)$ with output $u \in \Sigma^*$
more formally. If there exists an edge in $\mathcal{C}_1$ starting at $c_1$ and if $(c_1, i, c_2) \in \mathcal{C}_1$ is
the unique edge with the smallest label $i$ starting at $c_1$, then we have the single-step
transition

$$(\mathcal{C}_1, c_1) \xrightarrow{\,a\,} (\mathcal{C}_1 \setminus \{(c_1, i, c_2)\}, c_2) \quad \text{where } a = \mathrm{last}(w_i).$$

If there is no edge in $\mathcal{C}_1$ starting at $c_1$, then the outcome of $(\mathcal{C}_1, c_1) \xrightarrow{\,a\,}$ is undefined.
Inductively, we define $(\mathcal{C}_1, c_1) \xrightarrow{\,\varepsilon\,} (\mathcal{C}_1, c_1)$ and for $a \in \Sigma$ and $u \in \Sigma^*$ we have

$$(\mathcal{C}_1, c_1) \xrightarrow{\,au\,} (\mathcal{C}_2, c_2) \quad \text{if} \quad
(\mathcal{C}_1, c_1) \xrightarrow{\,u\,} (\mathcal{C}', c') \text{ and } (\mathcal{C}', c') \xrightarrow{\,a\,} (\mathcal{C}_2, c_2)$$

for some configuration $(\mathcal{C}', c')$. Hence, the reversal $\overline{au}$ is the label along the path of
length $|au|$ starting at configuration $(\mathcal{C}_1, c_1)$. In particular, if $(\mathcal{C}_1, c_1) \xrightarrow{\,u\,} (\mathcal{C}_2, c_2)$ holds,
then it is possible to chase at least $|u|$ transitions starting at $(\mathcal{C}_1, c_1)$; vice versa, if we
are chasing $\ell$ transitions then we obtain a word of length $\ell$ as a label. We note that
successively taking the edge with the smallest label comes from the use of right-shifts.
If we had used left-shifts we would have needed to chase largest edges for the following
lemma to hold. The reverse labeling of the big-step transitions is motivated by the
reconstruction procedure which will work from right to left.

Lemma 10. Let $k \in \mathbb{N}$, $V = ([v_1], \ldots, [v_s])$, $c_i$ = context$_k(v_i)$, and $G = G_k(M_k(V))$.
Let $\mathcal{C}_1$ consist of all edges of $G$. Then

$$\begin{aligned}
(\mathcal{C}_1, c_1) &\xrightarrow{\,v_1\,} (\mathcal{C}_2, c_1) \\
(\mathcal{C}_2, c_2) &\xrightarrow{\,v_2\,} (\mathcal{C}_3, c_2) \\
&\;\;\vdots \\
(\mathcal{C}_s, c_s) &\xrightarrow{\,v_s\,} (\mathcal{C}_{s+1}, c_s).
\end{aligned}$$

Proof. Let $M_k(V) = (w_1, \ldots, w_n)$. Consider some index $i$, $1 \leq i \leq s$, and let
$(u_1, \ldots, u_t) = ([v_1], \ldots, [v_{i-1}])$. Suppose that $\mathcal{C}_i$ consists of all edges of $G$ except
for those with labels $\nu_{k,V}(j)$ for $1 \leq j \leq t$. Let $q = |v_i|$. We write $v_i = a_1 \cdots a_q$ and
$u_{t+j} = r^{j-1}(v_i)$, i.e., $[v_i] = (u_{t+1}, \ldots, u_{t+q})$. Starting with
$(\mathcal{C}_{i,1}, c_{i,1}) = (\mathcal{C}_i, c_i)$, we show that the sequence of transitions

$$(\mathcal{C}_{i,1}, c_{i,1}) \xrightarrow{\,a_q\,} (\mathcal{C}_{i,2}, c_{i,2}) \xrightarrow{\,a_{q-1}\,} \cdots\;
(\mathcal{C}_{i,q}, c_{i,q}) \xrightarrow{\,a_1\,} (\mathcal{C}_{i,q+1}, c_{i,q+1})$$

is defined. More precisely, we will see that the transition
$(\mathcal{C}_{i,j}, c_{i,j}) \xrightarrow{\,a_{q+1-j}\,} (\mathcal{C}_{i,j+1}, c_{i,j+1})$
walks along the edge $(c_{i,j}, \nu_{k,V}(t+j), c_{i,j+1})$ and hence indeed is labeled with the
letter $a_{q+1-j}$ = last$(u_{t+j})$ = last$(w_{\nu_{k,V}(t+j)})$. Consider the context $c_{i,j}$. By induction, we
have $c_{i,j}$ = context$_k(u_{t+j})$ and no edge with label $\nu_{k,V}(\ell)$ for $1 \leq \ell < t + j$ occurs
in $\mathcal{C}_{i,j}$ while all other labels do occur. In particular, $(c_{i,j}, \nu_{k,V}(t+j), c_{i,j+1})$ for
$c_{i,j+1}$ = context$_k(r(u_{t+j}))$ = context$_k(u_{t+j+1})$ is an edge in $\mathcal{C}_{i,j}$ (where
context$_k(r(u_{t+j}))$ = context$_k(u_{t+j+1})$ only holds for $j < q$; we will consider the case
$j = q$ below). Suppose there were an edge $(c_{i,j}, z, c') \in \mathcal{C}_{i,j}$ with $z < \nu_{k,V}(t+j)$. Then
context$_k(w_z) = c_{i,j}$ and hence, $w_z$ has the same $k$-order context as $w_{\nu_{k,V}(t+j)}$. But in
this case, in the construction of $M_k(V)$ we used the index in $V$ as a tie-breaker. It follows
$\nu_{k,V}^{-1}(z) < t + j$ which contradicts the properties of $\mathcal{C}_{i,j}$. Hence,
$(c_{i,j}, \nu_{k,V}(t+j), c_{i,j+1})$ is the edge with the smallest label starting at context $c_{i,j}$.
Therefore, $\mathcal{C}_{i,j+1} = \mathcal{C}_{i,j} \setminus \{(c_{i,j}, \nu_{k,V}(t+j), c_{i,j+1})\}$
and $(\mathcal{C}_{i,j}, c_{i,j}) \xrightarrow{\,a_{q+1-j}\,} (\mathcal{C}_{i,j+1}, c_{i,j+1})$ indeed walks along the edge
$(c_{i,j}, \nu_{k,V}(t+j), c_{i,j+1})$. It remains to verify that $c_{i,1} = c_{i,q+1}$, but this is clear since
$c_{i,1}$ = context$_k(u_{t+1})$ = context$_k(r^q(u_{t+1}))$ = $c_{i,q+1}$. □

Lemma 11. Let $k \in \mathbb{N}$, $V = ([v_1], \ldots, [v_s])$, $M = M_k(V) = (w_1, \ldots, w_n)$, and
$L = \mathrm{last}(w_1) \cdots \mathrm{last}(w_n)$. Then it is possible to reconstruct $G_k(M)$ from $L$.

Proof. By Lemma 3 it is possible to reconstruct the contexts $c_i = \mathrm{context}_k(w_i)$. This
gives the vertices of the graph $G_k(M)$. Write $L = a_1 \cdots a_n$. For each $i \in \{1, \ldots, n\}$
we draw an edge $(c_i, i, \mathrm{context}_k(a_i c_i))$. This yields the edges of $G_k(M)$. □
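
The construction in this proof can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the function names are hypothetical, indices are 1-based as in the text, and the contexts are computed with the formula $c_i = \lambda_L \pi_L(i) \cdots \lambda_L \pi_L^k(i)$, where $\pi_L$ is the standard permutation of $L$ and $\lambda_L(j)$ is the $j$-th letter of $L$ (this is the form used in Example 13 for $k = 2$).

```python
def standard_permutation(L):
    """pi_L as a 1-based list: stable sort of the positions of L by letter."""
    order = sorted(range(len(L)), key=lambda p: L[p])  # Python's sort is stable
    return [p + 1 for p in order]


def contexts(L, k):
    """k-order contexts c_1, ..., c_n recovered from the last column L."""
    pi = standard_permutation(L)
    result = []
    for j in range(1, len(L) + 1):
        p, letters = j, []
        for _ in range(k):
            p = pi[p - 1]             # apply pi_L once more
            letters.append(L[p - 1])  # lambda_L: read the letter at that position
        result.append("".join(letters))
    return result


def context_graph(L, k):
    """Edges (c_i, i, context_k(a_i c_i)) of the k-order context graph."""
    ctx = contexts(L, k)
    return [(ctx[i - 1], i, (L[i - 1] + ctx[i - 1])[:k])
            for i in range(1, len(L) + 1)]


edges = context_graph("bbacabaacccbbcbb", 2)
print(edges[7])  # ('bc', 8, 'ab')
```

The vertices are the distinct contexts, and each position $i$ of $L$ contributes exactly one edge, so the graph has $n$ edges.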

Corollary 12. The $k$-order ST is invertible, i.e., given $(\mathrm{ST}_k(w), i)$ where $i$ is the
index of $w$ in $M_k([w])$, one can reconstruct the word $w$.

Proof. The construction of $w$ consists of two phases. First, by Lemma 11 we can compute
$G_k(M_k([w]))$. By Lemma 3 we can compute $c = \mathrm{context}_k(w)$ from $(\mathrm{ST}_k(w), i)$.
In the second phase, we use Lemma 10 for reconstructing $w$ by chasing

$$(C, c) \xrightarrow{w} (\emptyset, c)$$

where $C$ consists of all edges in $G_k(M_k([w]))$. □

Efficient implementations of the inverse transform rely on the fact that the $k$-order
contexts of $M_k([w])$ are ordered. This allows the implementation of the $k$-order
context graph $G_k$ in a vectorized form [1,19,20,21].

Example 13. We compute the sort transform of order 2 of $w = bcbccbcbcabbaaba$ from
Example 2. The list $M_2([w])$ is depicted in Figure 1(c). This yields the transform
$(\mathrm{ST}_2(w), i) = (bbacabaacccbbcbb, 8)$, where $L = \mathrm{ST}_2(w)$ is the last column of the
matrix $M_2([w])$ and $w$ is the $i$-th element in $M_2([w])$. Next, we show how to reconstruct
the input $w$ from $(L, i)$. The standard permutation induced by $L$ is



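$$\pi_L = \begin{pmatrix}
1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 \\
3 & 5 & 7 & 8 & 1 & 2 & 6 & 12 & 13 & 15 & 16 & 4 & 9 & 10 & 11 & 14
\end{pmatrix}$$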
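
A stable sort of the positions of $L$ by their letters reproduces this permutation. The following Python sketch (function names are illustrative assumptions, positions are 1-based as in the text) computes it together with its cycle decomposition.

```python
def standard_permutation(L):
    """pi_L as a 1-based list: pi[j - 1] is the position in L of the j-th
    letter when the positions are sorted stably by their letter."""
    order = sorted(range(len(L)), key=lambda p: L[p])  # Python's sort is stable
    return [p + 1 for p in order]


def cycles(pi):
    """Cycle decomposition of a 1-based permutation; each cycle starts at its minimum."""
    seen, result = set(), []
    for start in range(1, len(pi) + 1):
        if start in seen:
            continue
        cycle, j = [], start
        while j not in seen:
            seen.add(j)
            cycle.append(j)
            j = pi[j - 1]
        result.append(cycle)
    return result


pi = standard_permutation("bbacabaacccbbcbb")
print(pi)          # [3, 5, 7, 8, 1, 2, 6, 12, 13, 15, 16, 4, 9, 10, 11, 14]
print(cycles(pi))  # [[1, 3, 7, 6, 2, 5], [4, 8, 12], [9, 13], [10, 15, 11, 16, 14]]
```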



Note that $\pi_L$ has four cycles $C_1 = (1, 3, 7, 6, 2, 5)$, $C_2 = (4, 8, 12)$, $C_3 = (9, 13)$,
and $C_4 = (10, 15, 11, 16, 14)$. We obtain the context of order 2 of the $j$-th word by
$c_j = \lambda_L \pi_L(j)\, \lambda_L \pi_L^2(j)$. In particular, $c_1 = aa$, $c_2 = c_3 = c_4 = ab$, $c_5 = c_6 = ba$, $c_7 = bb$,
$c_8 = c_9 = c_{10} = c_{11} = bc$, $c_{12} = ca$, $c_{13} = c_{14} = c_{15} = cb$, and $c_{16} = cc$. With $L$ and
these contexts we can construct the graph $G = G_2(M_2([w]))$. The vertices of $G$ are
the contexts and the edge-labels represent positions in $L$. The graph $G$ is depicted
below:

[Figure: the context graph $G$.]

We are starting at the context $c_i = c_8 = bc$ and then we are traversing $G$ along
the smallest edge-label amongst the unused edges. The sequence of the edge labels
obtained this way is

$$(8, 2, 5, 3, 1, 6, 7, 4, 12, 9, 13, 10, 14, 16, 11, 15).$$

The labeling of this sequence of positions yields $\overline{w} = abaabbacbcbccbcb$. Since we are
constructing the input from right to left, we obtain $w = bcbccbcbcabbaaba$.
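
The reconstruction in this example can be replayed with a short Python sketch. It is an illustration under assumptions rather than the implementation referred to in the text: the name `inverse_st` is hypothetical, indices are 1-based, and the unused edges are kept per context in a queue of ascending labels, so that the smallest unused edge at a context is always at the front.

```python
from collections import deque


def inverse_st(L, i, k):
    """Invert the k-order sort transform; returns (edge labels, reconstructed word).

    Assumes (L, i) is a valid transform; otherwise a KeyError or IndexError occurs.
    """
    n = len(L)
    # Standard permutation pi_L: stable sort of the positions of L by letter.
    pi = [p + 1 for p in sorted(range(n), key=lambda p: L[p])]
    # Contexts c_j = lambda_L pi_L(j) ... lambda_L pi_L^k(j).
    ctx = []
    for j in range(1, n + 1):
        p, letters = j, []
        for _ in range(k):
            p = pi[p - 1]
            letters.append(L[p - 1])
        ctx.append("".join(letters))
    # Unused edges, grouped by source context, in ascending label order.
    outgoing = {}
    for label in range(1, n + 1):
        outgoing.setdefault(ctx[label - 1], deque()).append(label)
    # Chase n transitions, always taking the smallest unused edge.
    c, labels = ctx[i - 1], []
    for _ in range(n):
        label = outgoing[c].popleft()
        labels.append(label)
        c = (L[label - 1] + c)[:k]  # target context: context_k(a_label c)
    reversed_w = "".join(L[label - 1] for label in labels)
    return labels, reversed_w[::-1]


labels, w = inverse_st("bbacabaacccbbcbb", 8, 2)
print(labels)  # [8, 2, 5, 3, 1, 6, 7, 4, 12, 9, 13, 10, 14, 16, 11, 15]
print(w)       # bcbccbcbcabbaaba
```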

6 The bijective sort transform 

The bijective sort transform combines the Lyndon factorization with the ST. This
yields a new algorithm which serves as a similar preprocessing step in data
compression as the BWT. In many applications, it can be used as a substitute for
the ST. The proof of the bijectivity of the transform is slightly more technical than
the analogous result for the bijective BWT. The main reason is that the bijective
sort transform is less modular than the bijective BWT (which can be grouped into a
'Lyndon factorization part' and a 'Gessel-Reutenauer transform part' and which, for
example, allows the use of different orders on the alphabet for the different parts).

For the description of the bijective ST and of its inverse, we rely on notions from
Section 5. The bijective ST of a word $w$ of length $n$ is defined as follows. Let
$w = v_s \cdots v_1$ with $v_s \ge \cdots \ge v_1$ be the Lyndon factorization of $w$. Let
$M_k([v_1], \ldots, [v_s]) = (w_1, \ldots, w_n)$. Then the bijective ST of order $k$ of $w$ is
$\mathrm{LST}_k(w) = \mathrm{last}(w_1) \cdots \mathrm{last}(w_n)$.
That is, we are sorting the conjugacy classes of the Lyndon factors by $k$-order contexts
and then take the sequence of the last letters. The letter L in $\mathrm{LST}_k$ is for Lyndon.
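
The definition can be turned into a direct, unoptimized Python sketch. The following is an illustration under assumptions: the function names are hypothetical, the Lyndon factorization is computed with Duval's algorithm (a standard method that the text does not prescribe), each conjugacy class is listed as $v, r(v), r^2(v), \ldots$ with $r$ the right-shift, and the $k$-order context of a word shorter than $k$ is taken to be the length-$k$ prefix of its infinite repetition, which agrees with the contexts listed in Example 15.

```python
def lyndon_factors(w):
    """Duval's algorithm: Lyndon factors of w from left to right (v_s, ..., v_1)."""
    factors, i, n = [], 0, len(w)
    while i < n:
        j, m = i + 1, i
        while j < n and w[m] <= w[j]:
            m = i if w[m] < w[j] else m + 1
            j += 1
        while i <= m:
            factors.append(w[i:i + j - m])
            i += j - m
    return factors


def context(u, k):
    """Length-k prefix of the infinite repetition of the non-empty word u."""
    return (u * (k // len(u) + 1))[:k]


def lst(w, k):
    """Bijective sort transform of order k (quadratic-size sketch, not optimized)."""
    rows = []
    for v in reversed(lyndon_factors(w)):  # V = ([v_1], ..., [v_s])
        u = v
        for _ in range(len(v)):
            rows.append(u)
            u = u[-1] + u[:-1]             # right-shift r
    # Stable sort by context, so ties are broken by the index in V.
    rows.sort(key=lambda u: context(u, k))
    return "".join(u[-1] for u in rows)


print(lyndon_factors("bcbccbcbcabbaaba"))  # ['bcbcc', 'bc', 'bc', 'abb', 'aab', 'a']
print(lst("bcbccbcbcabbaaba", 2))          # abababaccccbbcbb
```

The sketch materializes every conjugate explicitly, which is convenient for checking small examples but is not how an efficient implementation would proceed.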

Theorem 14. The bijective ST of order $k$ is invertible, i.e., given $\mathrm{LST}_k(w)$ one can
reconstruct the word $w$.

Proof. Let $w = v_s \cdots v_1$ with $v_s \ge \cdots \ge v_1$ be the Lyndon factorization of $w$, let
$c_i = \mathrm{context}_k(v_i)$, and let $L = \mathrm{LST}_k(w)$. By Lemma 11 we can rebuild the $k$-order
context graph $G = G_k(M)$ of $M = M_k([v_1], \ldots, [v_s]) = (w_1, \ldots, w_n)$ from $L$. Let $C_1$ consist
of all edges in $G$. Then by Lemma 10 we see that

$$\begin{aligned}
(C_1, c_1) &\xrightarrow{v_1} (C_2, c_1) \\
&\;\;\vdots \\
(C_s, c_s) &\xrightarrow{v_s} (C_{s+1}, c_s).
\end{aligned}$$

We cannot use this directly for the reconstruction of $w$ since we do not know the
Lyndon factors and the contexts $c_i$.

The word $v_1$ is the first element in the list $M_k([v_1], \ldots, [v_s])$ because $v_1$ is
lexicographically minimal and it appears as the first element in the list $([v_1], \ldots, [v_s])$.
Therefore, by Lemma 3 we obtain $c_1 = \mathrm{context}_k(v_1) = \lambda_L \pi_L(1) \cdots \lambda_L \pi_L^k(1)$.

The reconstruction procedure works from right to left. Suppose we have already
reconstructed $w' v_j \cdots v_1$ for $j \ge 0$ with $w'$ being a (possibly empty) suffix of $v_{j+1}$.
Moreover, suppose we have used the correct contexts $c_1, \ldots, c_{j+1}$. Consider the
configuration $(C', c')$ defined by

$$\begin{aligned}
(C_1, c_1) &\xrightarrow{v_1} (C_2, c_1) \\
&\;\;\vdots \\
(C_j, c_j) &\xrightarrow{v_j} (C_{j+1}, c_j) \\
(C_{j+1}, c_{j+1}) &\xrightarrow{w'} (C', c')
\end{aligned}$$

We assume that the following invariant holds: $C_{j+1}$ contains no edges $(c'', \ell, c''')$ with
$c'' < c_{j+1}$. We want to rebuild the next letter. We have to consider three cases. First,
if $|w'| < |v_{j+1}|$ then

$$(C', c') \xrightarrow{a} (C'', c'')$$

yields the next letter $a$ such that $aw'$ is a suffix of $v_{j+1}$. Second, let $|w'| = |v_{j+1}|$ and
suppose that there exists an edge $(c_{j+1}, \ell, c'') \in C'$ starting at $c' = c_{j+1}$. Then there
exists a word $v'$ in $[v_{j+2}], \ldots, [v_s]$ such that $\mathrm{context}_k(v') = c_{j+1}$. If
$\mathrm{context}_k(v_{j+2}) \ne c_{j+1}$ then from the invariant it follows that
$\mathrm{context}_k(v_{j+2}) > c_{j+1} = \mathrm{context}_k(v')$. This
is a contradiction, since $v_{j+2}$ is minimal among the words in $[v_{j+2}], \ldots, [v_s]$. Hence,
$\mathrm{context}_k(v_{j+2}) = c_{j+2} = c_{j+1}$ and the invariant still holds for $C_{j+2} = C'$. The last
letter $a$ of $v_{j+2}$ is obtained by

$$(C', c') = (C_{j+2}, c_{j+2}) \xrightarrow{a} (C'', c'').$$

The third case is $|w'| = |v_{j+1}|$ and there is no edge $(c_{j+1}, \ell, c'') \in C'$ starting at
$c' = c_{j+1}$. As before, $v_{j+2}$ is minimal among the (remaining) words in $[v_{j+2}], \ldots, [v_s]$.
By construction of $G$, the unique edge $(c'', \ell, c''') \in C'$ with the minimal label $\ell$ has the
property that $w_\ell = v_{j+2}$. In particular, $c'' = c_{j+2}$. Since $v_{j+2}$ is minimal, the invariant
for $C_{j+2} = C'$ is established. In this case, the last letter $a$ of $v_{j+2}$ is obtained by

$$(C_{j+2}, c_{j+2}) \xrightarrow{a} (C'', c''').$$

We note that we cannot distinguish between the first and the second case since we do
not know the length of $v_{j+1}$, but in both cases, the computation of the next symbol is
identical. In particular, in contrast to the bijective BWT we do not implicitly recover
the Lyndon factorization of $w$. □

We note that the proof of Theorem 14 heavily relies on two design criteria. The first
one is to consider $M_k([v_1], \ldots, [v_s])$ rather than $M_k([v_s], \ldots, [v_1])$, and the second is to
use right-shifts rather than left-shifts. The proof of Theorem 14 yields the following
algorithm for reconstructing $w$ from $L = \mathrm{LST}_k(w)$:

(1) Compute the $k$-order context graph $G = G_k$ and the $k$-order context $c_1$ of the
last Lyndon factor of $w$.

(2) Start with the configuration $(C, c)$ where $C$ contains all edges of $G$ and $c := c_1$.

(3) If there exists an outgoing edge starting at $c$ in the set $C$, then

- Let $(c, \ell, c')$ be the edge with the minimal label $\ell$ starting at $c$.

- Output $\lambda_L(\ell)$.

- Set $C := C \setminus \{(c, \ell, c')\}$ and $c := c'$.

- Continue with step (3).

(4) If there is no outgoing edge starting at $c$ in the set $C$, but $C \ne \emptyset$, then

- Let $(c', \ell, c'') \in C$ be the edge with the minimal label $\ell$.

- Output $\lambda_L(\ell)$.

- Set $C := C \setminus \{(c', \ell, c'')\}$ and $c := c''$.

- Continue with step (3).

(5) The algorithm terminates as soon as $C = \emptyset$.

The sequence of the outputs is the reversal $\overline{w}$ of the word $w$.
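
Steps (1) to (5) can be sketched in Python as follows. This is a minimal sketch under assumptions, not a reference implementation: the names `contexts` and `inverse_lst` are hypothetical, indices are 1-based, the unused edges are stored per context in queues of ascending labels, and a pointer `low` tracks the smallest label that may still be unused, which is needed in step (4).

```python
from collections import deque


def contexts(L, k):
    """Contexts c_j = lambda_L pi_L(j) ... lambda_L pi_L^k(j), for j = 1, ..., n."""
    pi = [p + 1 for p in sorted(range(len(L)), key=lambda p: L[p])]  # stable sort
    result = []
    for j in range(1, len(L) + 1):
        p, letters = j, []
        for _ in range(k):
            p = pi[p - 1]
            letters.append(L[p - 1])
        result.append("".join(letters))
    return result


def inverse_lst(L, k):
    """Reconstruct w from L = LST_k(w) by chasing smallest unused edges."""
    n = len(L)
    ctx = contexts(L, k)                      # step (1)
    outgoing = {}
    for label in range(1, n + 1):
        outgoing.setdefault(ctx[label - 1], deque()).append(label)
    used = [False] * (n + 1)
    low = 1
    c = ctx[0] if n else ""                   # step (2): c := c_1
    out = []
    for _ in range(n):                        # step (5): stop once all edges are used
        if not outgoing.get(c):               # step (4): no unused edge starts at c
            while used[low]:
                low += 1
            c = ctx[low - 1]                  # source of the globally smallest edge
        label = outgoing[c].popleft()         # step (3): smallest unused edge at c
        used[label] = True
        out.append(L[label - 1])              # output lambda_L(label)
        c = (L[label - 1] + c)[:k]            # move to the target context
    return "".join(reversed(out))             # the outputs spell the reversal of w


print(inverse_lst("abababaccccbbcbb", 2))  # bcbccbcbcabbaaba
```

Because the edges leaving a context are always consumed in ascending order, the globally smallest unused edge in step (4) is the front element of the queue of its source context, so no search over all remaining edges is needed.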

Example 15. We consider the word $w = bcbccbcbcabbaaba$ from Example 2 and its
Lyndon factorization $w = v_6 \cdots v_1$ where $v_6 = bcbcc$, $v_5 = bc$, $v_4 = bc$, $v_3 = abb$,
$v_2 = aab$, and $v_1 = a$. For this particular word $w$ the bijective Burrows-Wheeler transform
and the bijective sort transform of order 2 coincide. From the earlier example on the
bijective BWT we know $L = \mathrm{LST}_2(w) = \mathrm{BWTS}(w) = abababaccccbbcbb$ and the standard
permutation $\pi_L$. As in Example 13 we can reconstruct the 2-order contexts $c_1, \ldots, c_{16}$ of
$M_2([v_1], \ldots, [v_6])$: $c_1 = c_2 = aa$, $c_3 = c_4 = ab$, $c_5 = c_6 = ba$, $c_7 = bb$,
$c_8 = c_9 = c_{10} = c_{11} = bc$, $c_{12} = c_{13} = c_{14} = c_{15} = cb$, and $c_{16} = cc$. With $L$ and
the 2-order contexts we can construct the graph $G = G_2(M_2([v_1], \ldots, [v_6]))$:

[Figure: the context graph $G$.]




We are starting with the edge with label 1 and then we are traversing $G$ along the
smallest unused edges. If we end in a context with no outgoing unused edges, then
we are continuing with the smallest unused edge. This gives the sequence $(1, 2, 5, 3)$
after which we end in context $aa$ with no unused edges available. Then we continue
with the sequences $(4, 6, 7)$ and $(8, 12, 9, 13, 10, 14, 16, 11, 15)$. The complete sequence
of edge labels obtained this way is

$$(1, 2, 5, 3, 4, 6, 7, 8, 12, 9, 13, 10, 14, 16, 11, 15)$$

and the labeling of this sequence with $\lambda_L$ yields $\overline{w} = abaabbacbcbccbcb$. As for the
ST, we are reconstructing the input from right to left, and hence we get
$w = bcbccbcbcabbaaba$.

7 Summary 

We discussed two bijective variants of the Burrows-Wheeler transform (BWT). The
first one is due to Scott. Roughly speaking, it is a combination of the Lyndon
factorization and the Gessel-Reutenauer transform. The second variant is derived from
the sort transform (ST); it is the main contribution of this paper. We gave full
constructive proofs for the bijectivity of both transforms. As a by-product, we provided
algorithms for the inverse of the BWT and the inverse of the ST. For the latter, we
introduced an auxiliary graph structure, the $k$-order context graph. This graph yields
an intermediate step in the computation of the inverse of the ST and the bijective
ST. It can be seen as a generalization of the cycle decomposition of the standard
permutation, which in turn can be used as an intermediate step in the computation
of the inverse of the BWT and the bijective BWT.